CC386.DOC (from ccdl150e.zip)

Text File | 1997-06-14 | 43KB | 1,267 lines

1.1) General introduction
1.2) Liability and authorship
1.3) General setup issues
1.4) Command line switches
1.5) Credits
2.1) ANSI compatibility
2.2) Missing features
2.3) C++ style features
2.4) Known bugs
3.1) Standard run-time libraries.
3.2) DOS libraries
4.1) Implementation-dependent keywords
4.2) Implementation-dependent preprocessor functions
4.3) Phi-text compatibility
5.1) Errors
6.1) Stack frames
6.2) ASM interface
6.3) Segmentation
6.4) Optimizations
7.1) Description of directory tree
7.2) Porting

1.1) General introduction

CC386 is a generic 386 DOS C compiler. Every effort has been made to have it recognize standard ANSI syntax; however, it should NOT be expected to produce code conforming to the ANSI standards, especially in regard to floating point. CC386 outputs assembly language code suitable for TASM and NASM, and possibly it will work with MASM.

This package includes various support programs and libraries required to build code that will run under an MSDOS DPMI server. TRAN's PMODE is used for the server, so programs generated by this package will still work even if there is no memory management software.

You need TASM and TLINK to use this package. You can get by with WLINK; however, TASM is still a requirement. The compiler itself will generate NASM-compatible code; however, this package is not sufficient to have that code actually run under DOS.

This package consists of the compiler, some Borland DPMI stubs currently needed for the compiler, run-time libraries for DOS, and header files. Separate packages have the source for the compiler and the run-time libraries. Additionally, the package includes a program 'CL386' which will call the compiler, TASM, and TLINK to create programs for you.

Warning: you must have a version of TASM earlier than version 4.1; TASM 4.1 has some bugs. The code was tested with 4.0; it will probably work with 3.0 and maybe even 2.0.

Part of the run-time libraries is a DEBUG-style debugger that can be linked into the code.
Note that this debugger will NOT work inside a Win95 DOS box; however, the rest of the package will.

New features and bug fixes:

    revamp of floating point to make it at least work a little, now that I have a coprocessor :)
    exception handling that works, including a floating point exception handler that will catch it if you use floating point when there is no coprocessor
    revamp of all the implicit cast operations to make them work properly
    fix to code generation for pointers to functions
    fix to the bit-size variables used in structures
    inline assembler recognizes ALL 486 opcodes (32-bit addressing modes only)
    various fixes to the debugger
    addition of SPAWN functions to the run-time library

This program IS capable of compiling itself and having the image run; I do not distribute this latter image because it will not run on a 386 when there is no FP coprocessor.

1.2) Liability and authorship

This compiler is presented on an 'as-is' basis without any guarantee of usability or fitness for any given application. Risks associated with using it, including financial loss or loss of life, are not the responsibility of the authors. However, this compiler is intended as an educational tool, and is not to be used commercially in any case without the express written consent of the authors.

The original author is Matthew Brandt. As he left it, it was a K&R style compiler with no floating point and minimal preprocessor support, targeted only for the m68k. Much of the work was done on a Unix machine and later ported to DOS. You can find his version on one of the Motorola file sites if you wish to compare.

The current version has been updated extensively to support a variety of ANSI constructs as well as i386 support and a better preprocessor. However, parts of the program still reflect Matthew's work. I have done my part of the coding with 16 and 32 bit MSDOS compilers.
This version of the code has NO DOS-specific features in it and should be portable to any 16 or 32 bit ANSI compiler.

1.3) General setup issues

To install the version which creates DOS executables, run the install.bat file. You need to give it a drive name:

    install C:

will install the necessary files on drive C:. A directory tree will be made under \CC386 and all necessary files will be copied there. Read the file INTRO.DOS for a brief overview of how to create programs for DOS.

The only thing really needed for the compiler to work is a pointer to the include directories. These may be specified in the environment variable 'CCINCL' or with the /I command line switch. I normally set CCINCL=\cc386\include to get at the ANSI headers and then use /I if any other directories are required.

If you use the install.bat program, CL386 will take care of the include and library issues via its configuration file, and no other setup is required beyond running the install program.

1.4) Command line switches

Switches prefixed with a '+' or '-' may be turned on or off. The last occurrence of the switch determines the state. For these switches '/' is equivalent to '-'. Note that codegen parameters must generally be the same for all modules in a program, or unpredictable results will occur.

    +e        - make error file
    +i        - make preprocessed file
                default is -i
    /ffile    - process arguments in file 'file'
    +l        - make LST file
                default is -l
    /w-all    - no warnings, errors only
                warnings may also be suppressed individually. See ERROR.DOC
    -A        - disable ANSI compatibility and enable some non-standard features
                default is +A
    /C        - codegen params
      /C+d    - display internal diagnostics
      /C-b    - no BSS
      /C-l    - don't put C source in the ASM file
      /C-m    - don't mangle symbols with a leading underscore
      /C+p    - pack variables for space. On a 68020+ minimize at word alignment
      /C+r    - reverse order of bit ops
      /C+F    - (386) force the TASM .MODEL directive to use FLAT mode
                This may become the default in some future release.
      /C+N    - generate NASM code
      /C-R    - use stack pointer rather than link registers
                default is /C+blmR-prFN
    /Dxxx     - define a macro 'xxx'
    /E##      - max number of errors to generate
    /Idirs    - specify include directories. Use a semicolon to separate multiple directory specifications. The directories specified by the environment variable "CCINCL" are always searched first.
    /O        - Optimizer params
      /O-Rxxx - Turn off register optimizations. In place of the xxx put any combination of:
                  a - turn off address register optimizations
                  f - turn off floating point register optimizations
                  d - turn off data register optimizations
                default is all register optimizations enabled
    +S        - reserved, no use (yet)
                default is -S

The compiler will look for the symbol CC386 (or CC68K) in the environment. If it finds it, it will evaluate any command line arguments in it prior to evaluating the command line. Note that command line parameters will override the environment variable; in particular, specifying a search path both in the environment variable and on the command line will result in loss of the search-path environment. There is an alternate environment variable CCINCL which specifies include paths which will be appended to the command line specification.

1.5) Credits

The following people contributed source code to this program.

    Matthew Brandt:       original K&R C compiler
    Thomas Pytel (TRAN):  DPMI extender for DOS
    Kirill Joss:          CL386 compiler shell
    David Lindauer:       Ansification, preprocessor, run-time libraries, 386 code gen, miscellaneous enhancements to original compiler

Many people were instrumental in locating bugs; I'd like to acknowledge two who were especially helpful with *lots* of testing:

    Johann Klockars
    Kirill Joss

And thanks to David Gurevich and Kirill Joss for helpful suggestions on packaging.

2.1) ANSI compatibility

This compiler is meant to be ANSI compatible at the source level. However, I have never seen the ANSI documentation for what that means; if you find something it doesn't do, let me know!
However, there is no guarantee that code generated will meet ANSI runtime requirements in terms of evaluation ordering, especially with casts. Floating point is done using the host coprocessor, and is NOT adjusted for ANSI/IEEE compatibility.

The run-time library is designed to act like an ANSI library; however, the internals are most likely somewhat different, especially since I did away with static buffers where possible.

2.2) Missing features

The following are known to be missing:

  a) libraries don't handle any kind of floating point
  b) expressions of the form: (T) are not handled correctly when T is a typedef

2.3) C++ style features

The C compiler has some rudimentary C++ support. It recognizes:

  1) Overloaded functions (but not the overload keyword)
  2) Variable declarations anywhere
  3) Reference variables
  4) Function parameter defaults
  5) Stricter type checking
  6) Improved init of static pointers and reference variables
  7) Detailed C++ error messages

Classes and C++ keywords aren't yet supported. To enable these features use the extension .CPP on your input file.

2.4) Known bugs

The following known bugs exist:

  a) Expression evaluation is recursive. With a 4K compiler stack the limit is approximately:

         a = (b()+(c()+(d()+(e()+f()))));

     Beyond this, unpredictable results will occur. Raise the stack limit or rearrange the expression with higher order parentheses to the left. Notice this would not be a problem without the grouping parentheses, because the compiler wouldn't have to maintain so many contexts. I compile the compiler with a 20K stack.

  b) Floating point may or may not work. A floating point library will be added later and this will be checked out.

  d) Expressions such as:

         a = b = c;

     may not return the correct value to anything other than the rightmost assignment. In general it will work, but if there are multiple implicit casts going on from one assignment to the next it may not work correctly.

  e) % may not work properly for signed divisions.
     The sign may be wrong but the value will be correct. This may or may not be a problem; I haven't analyzed it.

  f) long and unsigned constants will not be optimized or evaluated correctly when there are two or more of them in an expression (type may not propagate).

  g) The identifier 'pascal' is a standard keyword rather than something a user may redefine.

3.1) Standard run-time libraries.

The libraries were implemented according to 'The Waite Group's Essential Guide to ANSI C', ISBN 0-672-22673-1. My copy is circa 1989.

Floating point library functions are not supported at this time, as I have no way to test them. This includes things like atof and difftime as well as most of the math libraries.

All the functions in this book were implemented except floating point. However, process control stuff is kind of sketchy at this time.

Most of the IO library, part of the time library, and the malloc library functions require operating system support. Documentation for this is provided with the run-time library sources; however, this package contains sufficient code to use DPMI as the operating system.

There may be a variety of cases where things don't work as expected. For example, scanf will only read one line no matter what... When a function such as strftime requires a buffer length to be given, the results are undefined if the text length exceeds the buffer length. Also, I just found out that opening a file with the 'a' attribute is supposed to override any attempt to set the position for write in the file... In this implementation all it does is position to the end of the file at open time.

The libraries were originally designed in a reentrant fashion; however, this breaks much standard code, and the version of the libraries included here has static buffers where called for.

ERRNO isn't supported at this time.

Many of the library functions depend on having the startup/rundown code included.
This code initializes a few global variables and executes any startup/rundown functions the libraries need for initialization and cleanup.

To use the libraries two files must be included in your link. First is the startup module (c0dos or c0dosd) and second is the library itself (cldos). An example build if _main is defined in q.c:

    cc386 q.c
    tasm /ml /m2 q.asm
    tlink c0dos q,q,q,cldos

will build q.exe. Note that the startup module MUST be the first object module specified, as it defines the segmentation setup required.

Two startup modules are provided. c0dos is a standard C startup module. c0dosd is the same module, but it will draw in a debugger from the library (approximately 16K) and call it rather than execute your code. The debugger is somewhat similar to DEBUG. When the debugger starts up, the EIP and registers will be set to the values they would have at the beginning of your _main function. The debugger traps several exceptions including int 3, so you can put int 3 in your code at places you want to debug. Warning! The debugger will NOT work in a DOS box in Windows 95, as I could not get appropriate access to exceptions.

The startup modules use TRAN's PMODE to manage pmode resources. I use version 3.07... I had to modify the class names in his segment declarations to make them different from the 32-bit code segments, but other than that they are his release. I have included the sources as per his licensing in msdos\pmode307.

I manage several exceptions; traps 6, 13, and 14 are all routed through the signal-handling code. By default they print 'general protection fault' and exit, but you can trap them using the signal mechanism if you want. Unless you are in a DOS box... Likewise, traps 7, 8, and 16 relating to floating point are routed through the signal handling code unless you are in a DOS box. The default signal handling code just prints a message and jumps to the program exit point...
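As a rough illustration of trapping these exceptions through the signal mechanism, here is a minimal sketch in standard C. The mapping of traps to SIGILL, SIGSEGV and SIGFPE is an assumption made for the example (this document does not say which signal numbers the library uses), and the names install_fault_handlers, on_fault and last_signal are hypothetical.

```c
#include <signal.h>

/* Last signal seen by the handler; 0 means none so far. */
static volatile sig_atomic_t last_signal = 0;

/* Hypothetical handler: record the signal number and return. */
static void on_fault(int sig)
{
    last_signal = sig;
}

/*
 * Install one handler for the fault classes described above.
 * ASSUMPTION: invalid opcode arrives as SIGILL, protection and page
 * faults as SIGSEGV, and floating point traps as SIGFPE.
 * Returns 0 on success, -1 if any handler could not be installed.
 */
int install_fault_handlers(void)
{
    if (signal(SIGILL, on_fault) == SIG_ERR)
        return -1;
    if (signal(SIGSEGV, on_fault) == SIG_ERR)
        return -1;
    if (signal(SIGFPE, on_fault) == SIG_ERR)
        return -1;
    return 0;
}
```

A program would call install_fault_handlers() early in main. A real handler would usually clean up and exit rather than return, since resuming after a genuine hardware fault is not generally safe.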
I also manage the ctrl-c interrupt from DOS (but not ctrl-brk from BIOS; that is handled via the DOS interrupt) and exit the program cleanly if ctrl-c is pressed. Note that ctrl-c will even exit the debugger! I should probably fix that...

4.1) Implementation-dependent keywords

a) The following implementation-dependent keywords have been added.

    i386          use
    _interrupt    Generate a function which may be used as a trap/interrupt.
    _genbyte      Generate data in the code segment
    _absolute     Allocate a global variable at an absolute address. Such variables will be directly addressable.
    pascal        Force the function declaration to use pascal calling conventions.

b) The following implementation variables have been added. These variables directly access the assembly language registers they name. Note they should be used with caution and may change periodically at the compiler's discretion. Also, casts of them or assignments to them may change the machine state functionally... and wreck the code the compiler has generated.

    i386
    _EAX _EBX _ECX _EDX _ESP _EBP _ESI _EDI

The 386 compiler also knows the keyword 'asm', which is an escape to allow inline assembly. The syntax is:

    asm my_instruction;

or

    asm {
        my list of instructions;
    }

The compiler catches most errors in inline assembly code at compile time. It will also translate the names of local variables into proper stack-based addressing modes.

4.2) Implementation-dependent preprocessor functions

The preprocessor is more or less ANSI compatible. The following #pragma statements are supported:

    #pragma regopt xxx
        now obsolete

    #pragma startup xxx #
        xxx may be any function name
        # may be a priority value from 20-90 (other values are used by the run-time library)
        Higher priority functions get run first. This option tells the compiler to inform the startup routines that this function should be run prior to calling main.
    #pragma rundown xxx #
        xxx may be any function name
        # may be an integer value from 20-90 (other values are used by the run-time library)
        Higher priority functions get run first. This option tells the compiler to inform the startup routines that this function should be run after main exits.

The following macros are predefined:

    _i386_        (386 only) compiler is generating 386 code
    _m68k_        (68k only) compiler is generating 68K code
    __cplusplus   if the compiler is allowing C++ extensions
    __FILE__      the file name of the source file as a string
    __DATE__      the date as a string
    __TIME__      the time as a string
    __LINE__      the line number as a number

#if expressions can use defined(xxx) to determine if a macro is defined.

4.3) Phi-text compatibility

This compiler is capable of understanding 'phi-text', which is an extended text-based character set. It is somewhat preferable to UNICODE for western programmers, as it does not encompass thousands of characters that are little used by main-stream westerners.

Phi-text is a banked character set. Each character in its full form is 32 bits; this encompasses the following information:

    cwb:               a number from 32 to 127 describing the character
    bank:              a bank from 0 to 15. BANK 0 is the ASCII character set with some modifications to control characters.
    basic attributes:  BOLD, UNDERLINE, ITALIC, HIDDEN, and REVERSED attributes. BLINKING may be substituted for ITALIC; however, we normally use ITALIC.
    color:             16 color renditions. The colors have been chosen to reflect complementary colors. Foreground and background may be specified for each character.
    size:              16 size attribute
    font:              16 font attribute

32 bits per character is a bit much for some applications; an application may elect to ignore certain fields. This compiler ignores ALL fields but the bank and the CWB (although it may look briefly at attributes, I don't remember). Internally, the characters are thus represented with 16-bit fields in this compiler.
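The field widths listed above add up to exactly 32 bits: 7 (cwb) + 4 (bank) + 5 (attribute flags) + 4 + 4 (foreground and background color) + 4 (size) + 4 (font). The sketch below packs them in that order and then reduces a character to the bank and cwb, as the compiler is said to do. Only the widths come from the text; the bit positions, the PhiChar type and the phi_* function names are assumptions made for illustration, not the actual phi-text layout.

```c
/* Hypothetical unpacked form of one phi-text character. */
typedef struct {
    unsigned cwb;    /* 32..127, 7 bits             */
    unsigned bank;   /* 0..15,   4 bits             */
    unsigned attrs;  /* five attribute flags, 5 bits */
    unsigned fg;     /* foreground color, 4 bits     */
    unsigned bg;     /* background color, 4 bits     */
    unsigned size;   /* 4 bits                       */
    unsigned font;   /* 4 bits                       */
} PhiChar;

/* Pack the fields into 32 bits. ASSUMED layout, low bits first. */
unsigned long phi_pack(const PhiChar *c)
{
    return  ((unsigned long)(c->cwb   & 0x7Fu))
          | ((unsigned long)(c->bank  & 0x0Fu) << 7)
          | ((unsigned long)(c->attrs & 0x1Fu) << 11)
          | ((unsigned long)(c->fg    & 0x0Fu) << 16)
          | ((unsigned long)(c->bg    & 0x0Fu) << 20)
          | ((unsigned long)(c->size  & 0x0Fu) << 24)
          | ((unsigned long)(c->font  & 0x0Fu) << 28);
}

/* Keep only bank and cwb (11 bits), which fits a 16-bit field. */
unsigned short phi_reduce(unsigned long full)
{
    return (unsigned short)(full & 0x07FFu);
}

/* Extract the bank from a reduced character. */
unsigned phi_bank(unsigned short reduced)
{
    return (reduced >> 7) & 0x0Fu;
}

/* Extract the cwb from a reduced character. */
unsigned phi_cwb(unsigned short reduced)
{
    return reduced & 0x7Fu;
}
```

With this layout, dropping attributes, color, size and font is a single mask, which is one plausible reason a compiler would carry only 16 bits per character internally.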
To ease the storage requirements of such a character set, there exists a 'streamed' form of phi-text. This takes advantage of the notion that attributes are not likely to change as rapidly as the character information. Basically, if the high bit of a streamed byte is set, it indicates that control information is embedded which indicates the new attributes. There is also a 'repeat' code so that long strings of repeating characters (for example spaces) get packed together. This is not quite as efficient as tabbing, but in the long run it works out better because there are several situations where such strings may occur in phi-text and they do NOT always involve spaces.

In this compiler, the incoming text is in streamed format (which defaults to ASCII unless an appropriate editor is used). The streamed format is converted to a flat format and all information is stripped except that essential to detecting the character. Preprocessing is done on the flat version... but when the scanner starts looking for tokens it then converts the flat version back to stream (minus colors and attributes) for more efficiency in the parser and back end. If the source file is streamed phi-text, the list and assembly files will also be streamed phi-text; color information is added to the list file just to make it a little flashy, although I have a monochrome monitor so the colors I picked may be awkward.

One problem exists with streamed formats: in case of an error situation it is possible to lose important synchronization and so wreck more than a single character. For this reason streamed phi-text is partially synchronizing; at the beginning of each line all attribute information defaults back to a standard default. In this way one never loses more than a complete line in the presence of simple errors. And even at that, a smart editor could be designed to help one recover from such simple errors... provided that errors occurred often enough to be worth the effort.
We have seen that phi-text is composed of 16 banks with 96 characters per bank, for a total of 1536 characters. The first bank is pure ASCII, with a few modifications, but what are the other banks? About half of them are currently unused. Of those, some have been deliberately reserved for application-specific and system-specific uses by the designer of phi-text. The defined characters can be broken roughly into the following groups:

  1) ASCII characters
  2) European extensions (accented characters)
  3) Greek characters
  4) Cyrillic characters
  5) line drawing characters
  6) mathematics characters
  7) miscellaneous characters

While it IS possible to extend certain C operators with more compact character representations in a compiler like this one, use of phi-text has been limited to allowing Greek and Cyrillic characters in variable names, and to allowing things in boxes to be treated as comments. The primary editor for phi-text has an extension that allows usage of the arrow keys to draw lines on the screen, and this makes beautifying code a snap.

For more information about phi-text, contact:

    Paul McKneely
    P.O. BOX 5641
    Pasadena, TX, 77508
    email: gecko@onramp.net

5.1) Errors

This is a list of possible errors. There are two types of errors... 'Errors' and 'Warnings'. An 'Error' signifies an event which the compiler cannot handle, whereas a 'Warning' is a diagnostic which indicates that something is possibly wrong but the compiler will make assumptions about it. This list is slightly outdated; it is missing new errors which the inline assembler can generate.

Each 'Warning' will have a value in parentheses; this value may be used on the command line to suppress the warning. The value 'all' may be used to suppress all warnings. Errors may not be suppressed.

Example:

    cc -w-ieq a.c   ; Suppress the 'Possible incorrect assignment' warning.
    cc -w-all a.c   ; Suppress ALL warnings

Some of these errors result when the compiler is in C++ mode.
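To make the 'ieq' warning from the example above concrete, here is a small sketch of the kind of source that an assignment-in-condition diagnostic is aimed at, next to the comparison that was probably intended. The function names are hypothetical, and exactly which constructs CC386 flags is not spelled out here beyond the descriptions in the list that follows.

```c
/*
 * Intended to test x against zero, but '=' assigns 0 to x instead.
 * The condition is then always false, so this returns 0 for every input.
 * This is the pattern an 'ieq'-style warning is meant to catch.
 */
int buggy_is_zero(int x)
{
    if (x = 0)
        return 1;
    return 0;
}

/* The comparison that was probably meant. */
int is_zero(int x)
{
    if (x == 0)
        return 1;
    return 0;
}
```

Suppressing the warning with -w-ieq makes sense only when the assignment inside the condition is deliberate.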
Error: _int keyword not allowed in Pascal declarations
    Pascal declarations may not be used as traps or interrupts.
Error: Ambiguity between %s and %s
    C++. Compiler cannot choose between two almost equivalent overloaded definitions.
Warning: ('cln') Argument list too long %s
    Argument list for the function call specified is too long. Compiler ignores the extra args.
Error: Argument list too long in redeclaration of function '%s'
    A prototyped function has been redeclared with a different argument list.
Error: Argument list too short %s
    Too few parameters have been supplied in a function call.
Error: Argument list too short in redeclaration of function '%s'
    A prototyped function has been redeclared with a different argument list.
Error: Bit field must be signed or unsigned int
    ANSI C requires a bit field to be of one of these types. If extensions are allowed, bit fields can be of any integer type.
Error: Bit field only allowed on scalar types
    Bit fields can only be used on integral types. This error will occur if you are in non-ANSI mode and use any non-integral type as the basis for a bit field.
Error: Bit field too big
    Bit fields must fit within the processor word size.
Warning: ('pro') Call to function '%s' with no prototype
    A function call has been made to a function that has not been previously declared. Compiler guesses at argument types.
Error: Cannot cast %s
    C++. Some casting of classes is not allowed.
Error: Cannot define a pointer or reference to a reference
    C++. Reference variables are treated specially in this regard.
Error: Cannot initialize '%s'
    An error occurred while trying to process a variable initialization.
Error: Cannot modify a const val
    A CONST value may not be modified.
Error: Cannot open file "%s" for read access
    An include file was not found.
Error: Cannot overload 'main'
    C++.
    main() must not be overloaded.
Error: Cannot take address of bit field
    Pointers to bit fields are not allowed.
Error: Cannot use bit field as a non-member
    Only structure members may have a bit field qualifier.
Warning: ('cno') Code has no effect
    This line of code compiled to nothing.
Error: Constant value expected
    In general, initializers must be constant values. Some others must as well.
Error: Constructor/destructor must be untyped
    C++ can't type constructors/destructors.
Error: Continue not allowed
    Not in a scope where a continue makes sense.
Warning: ('cnv') Conversion may truncate ignificant digits
    An implicit cast may result in loss of significant digits. This warning is NOT produced for explicit casts.
Error: Could not find a match for '%s'
    C++. This function call is not prototyped either directly or with an overload or defaulted function prototype.
Warning: ('dpc') Dangerous pointer cast
    This happens when the size of the pointer is not the same as the size of the (scalar) type you are using with it.
Error: Declaration expected
    Parser got a statement or other value when it was expecting a declaration.
Error: Declaration not allowed here
    Parser found a declaration when it was expecting a statement.
Error: Default missing after parameter '%s'
    C++. This parameter was assumed to have a default which is missing.
Error: Destructor for class '%s' expected
    C++. A destructor was expected.
Error: Duplicate case %d
    Two case statements evaluate to the same value.
Error: Duplicate label '%s'
    The label occurs twice in the same procedure.
Error: Duplicate symbol '%s'
    The symbol is being redefined.
Error: Ellipse (...) not allowed in Pascal declarations
    Pascal-style declarations may not have variable arguments.
Error: Expected '%c'
    The compiler expected a specific character or token.
Error: Expression expected
    The compiler was ready to parse an expression but found something else.
Error: File ended with comment in progress
    Comments must have an ending point within the same file or include file.
Error: File name expected in #include directive
    #include directive must have a file name.
Error: Function declaration not allowed here
    A function declaration was attempted in an invalid place, for example inside a structure or inside another function.
Warning: ('ret') Function should return a value
    This warning occurs when a function is not of type 'void' and you exit without returning a value.
Error: Identifier expected
    The parser was expecting a variable/function name.
Error: Illegal call to main() from within program
    C++. C++ programs may not call main().
Error: Illegal character '%c'
    The parser detected an illegal character sequence.
Error: Illegal pointer
    An attempt was made to use a non-pointer in a pointer context.
Error: Illegal pure declaration syntzx of '%s'
    C++. Virtual declaration syntax is wrong.
Warning: ('irg') Illegal register var '%s'
    The size of the variable was too big for it to fit in a register.
Error: Illegal storage class specifier '%s'
    Conflicting or illegal specifier on a declaration.
Error: Illegal storage class specifier on '%s'
    Conflicting or illegal specifier on a declaration.
Error: Illegal typedef of '%s'
    Attempt to reuse a symbol name as a typedef.
Error: Illegal use of reference operator
    Attempt to use '&' in a context where it is not permitted.
Error: Illegal use of void pointer
    Cannot take the size of a void pointer.
Error: Inserted '%c'
    The parser guessed at a symbol to insert.
Error: Invalid '&' on register var '%s'
    Cannot take the address of a register.
Error: Invalid floating point
    Cannot use floating point in certain types of math functions (e.g. logic functions).
Error: Invalid preprocessor directive '%s'
    Preprocessor directive is unknown.
Error: Invalid trap id
    CPU-specific.
    Indicates a CPU operation (int or trap) was called with an identifier that is too large.
Error: '%s' is not a function
    Cannot call non-functions.
Error: '%s' is not a label
    Cannot jump to non-labels.
Error: Local class functions not supported
    C++. Cannot support class definitions as local variables.
Error: Local variables may not be used as parameter defaults
    C++. Parameter defaults must be in scope prior to calling the function.
Warning: ('lli') long long int type not supported, defaulting to long int
    long long int type will parse correctly but it is unsupported.
Error: Lvalue expected
    Cannot assign to the address of a variable.
Error: Macro substitution error
    Macro expansions are limited to 4096 characters.
Error: Misplaced else
    Unexpected else found in input stream.
Error: '%s' must be a predefined class or struct
    C++. Cannot work with this structure/class because it has not been fully defined.
Warning: ('zer') No memory allocated for '%s'
    An unsized array has no initializers either.
Warning: ('nsf') Nonexistant static func '%s'
    A static function was prototyped but never defined.
Warning: ('npo') Nonportable pointer conversion
    An implicit pointer conversion may result in code that compiles incorrectly with other C compilers.
Error: Non-scalar array index
    Array indexes must be of integral type.
Error: Numeric constant is too large
    An integer or hex constant was too large for the base type, or the non-fractional part of a floating-point number could not fit in a long integer.
Error: Pointer type expected
    A pointer was expected.
Warning: ('ieq') Possible incorrect assignment
    The symbol '=' was used at the outer scope in an if statement expression. This could be intended, but often is a mistype of the symbol '==', so the compiler warns you.
Warning: ('san') Possible superfluous &
    & isn't needed when taking the address of an array. This is a junk message; ANSI C doesn't care either way.
Warning: ('sud') Possible use of '%s' before assignment
    A variable has been used but it possibly has not been initialized with a value.
Error: Reference initialization needs lvalue
    C++. Reference syntax calls for something whose address can be taken.
Error: Reference member '%s' in a class with no constructors
    C++. The reference variable cannot be initted at class startup because the constructor is supposed to do it.
Error: Reference variable '%s' must be initialized
    C++. Cannot change what a reference variable equates to at run-time.
Error: Return type is void
    Attempt to return a value from a void function.
Error: Size is unknown or zero
    Attempt to use the size of a variable with a type that has been forward declared.
Error: Size of '%s' is unknown or zero
    Attempt to use the size of a variable with a type that has been forward declared.
Error: Startup/rundown function '%s' is unknown or not a function
    A function named in a '#pragma startup' or '#pragma rundown' is either not a function or is not defined.
Error: String constant too long
    A multi-line string is too long.
Warning: ('spc') Suspicious pointer conversion
    A pointer operation is being performed on pointers which have different base types.
Warning: ('fun') Static function '%s' is declared but never used
    This static function is just a space waster.
Warning: ('sud') Static variable '%s' is declared but never used
    This static variable is just a space waster.
Warning: ('fsu') Structure '%s' is undefined
    The compile completed with a structure whose type was never defined.
Error: Switch argument must be of integral type
    Switch arguments must be integers.
Warning: ('tua') Temporary used for parameter %s
    C++. A constant was passed in a reference parameter and the compiler automatically made a variable so the called function would be happy.
Warning: ('tui') Temporary used to initialize %s
    The reference variable is initted with a constant; extra storage had to be created for it.
Error: Too many initializers
    A structure/array has too many initializers.
Error: Type expected in sizeof
    sizeof argument was not a type or variable.
Error: Type mismatch
    Generic type mismatch.
Error: Type mismatch in arg '%s'
    Type mismatch for function calls.
Error: Type mismatch in redeclaration of '%s'
    A variable has been redeclared with a different type from before.
Error: Type mismatch in return
    The value being returned does not match the function type.
Error: Unbalanced preprocessor directives
    #if - #endif directives were not balanced.
Warning: Undefined label '%s'
    The label should appear somewhere, as there is a goto to it.
Error: Undefined symbol '%s'
    This is an unknown symbol.
Error: Unexpected '%s'
    This keyword was unexpected.
Warning: ('urc') Unreachable code
    Code stream can never get here.
Warning: ('lun') Unused label '%s'
    A label was declared but never used.
Error: User error: %s
    An #error directive results in this.
Warning: ('sas') Variable '%s' is assigned a value which is never used
    After assignment to the var, there is no subsequent use.
Error: Variable '%s' cannot have a type qualifier
    C++. ???
Error: Variable '%s' is not a class instance
    C++. A class instance was expected.
Warning: ('sun') Variable '%s' is declared but never used
    This variable was declared but nothing ever referenced it. Space waster.

6.1) Stack frames

There are a variety of options for stack frames.

  a) Standard C-style stack frames. An index register (EBP or A6) is used to point at a value between the parameters and the function local variables; all local variables and function parameters are indexed from this base register. This is the default.

  b) The compiler can free the link register and index all local variables and function parameters off the stack pointer.

  c) (68K only) Parameter lists may be located anywhere in memory; the parameter list pointer is passed in A0. A0 is then transferred to A6 and parameters are indexed off A6.
Meanwhile local variables are indexed from the stack pointer.

On the 68K, several codegen options are available. By default the 68K compiler generates PIC code based around a 32K memory model. A 68020 mode is available for specifying extended 68020 features such as enhanced addressing modes and specialized instructions. Another option is to generate 68000 code in such a way that a data section greater than 32K can be used. The final option allows one to disable PIC mode and generate code that will be placed at absolute addresses in memory.

386 code is fairly straightforward. It is a little more complex than it would otherwise need to be, because special function registers must be used for some operations like multiplies and shifts. 386 code will be a little bulky because of this need.

68K code is position-independent. All global data is accessed off of register A5 or A6; function arguments are indexed off of A6 or A7; and the stack is indexed off A7. String constants are indexed off the PC. Because of this, the total data size may only be 32K unless one of the /C+2, /C+L, or /C+A options is used.

6.2) ASM interface

The assembly language program must not modify any registers except the scratch registers:

386: EAX, ECX, EDX

Parameters are passed on the stack, with the leftmost parameter at the lowest address. In all assembly situations it is convenient to use an index register to index the parameters. The index register must be loaded with the address of the first parameter (which will be the stack pointer + 4 if you don't push the index register, or the stack pointer + 8 if you do).

Parameters normally take 4 bytes for the standard data types; however, double and long double types take 8 and 12 bytes respectively. If you pass a structure by value, the amount of stack space used depends on the size of the structure.
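The parameter layout described in 6.2 can be worked out by hand, but a small C sketch may make the arithmetic easier to follow. It is only an illustration of the rules stated above, not code from the compiler: the names arg_kind, arg_size and param_offsets are invented here, and structures passed by value are left out because their size depends on the structure.

```c
#include <assert.h>
#include <stddef.h>

/* Size classes taken from the text of 6.2 (names are hypothetical). */
enum arg_kind { ARG_STANDARD, ARG_DOUBLE, ARG_LONG_DOUBLE };

/* Stack space one parameter takes, per 6.2: 4, 8 or 12 bytes. */
size_t arg_size(enum arg_kind kind)
{
    switch (kind) {
    case ARG_DOUBLE:      return 8;
    case ARG_LONG_DOUBLE: return 12;
    default:              return 4;
    }
}

/*
 * Fill offsets[i] with the distance of parameter i from the stack
 * pointer on entry to an assembly routine. The leftmost parameter is
 * at the lowest address: 4 bytes above the stack pointer (past the
 * return address), or 8 if the index register has been pushed.
 */
void param_offsets(const enum arg_kind *kinds, size_t count,
                   int index_reg_pushed, size_t *offsets)
{
    size_t offset = index_reg_pushed ? 8 : 4;
    size_t i;

    assert(kinds != NULL && offsets != NULL);
    for (i = 0; i < count; i++) {
        offsets[i] = offset;
        offset += arg_size(kinds[i]);
    }
}
```

For a routine called as f(int a, double b, char *c) that pushes EBP before loading it from ESP, this gives a at EBP+8, b at EBP+12 and c at EBP+20.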
6.3) Segmentation

The following segments or sections may appear in the output file:

386 name    use
.CODE       Code and string constants
.DATA       Initialized global data
.?DATA      Uninitialized global data
INITDATA    #pragma startup links
EXITDATA    #pragma rundown links
CPPDATA     C++ static initializations

The following switches affect code generation:

/C-b    combine the BSS with the DATA
/C-l    don't put line numbers in ASM file
/C-m    don't mangle with underscores
/C+p    pack variables
/C+r    use reverse order for bit fields. Note that this option reverses the allocation order but does not reverse the value in the field.

The following #pragma statements affect code generation:

#pragma regopt   - enable/disable register allocations
#pragma startup  - name a routine to be executed on startup
#pragma rundown  - name a routine to be executed on rundown

6.4) Optimizations

The compiler performs the following optimizations:

a) Constant folding
When common math is done with constants, the compiler will evaluate the expression and replace it with a constant.

b) Reduction in strength
Multiplies and divides are turned into shifts when appropriate. Mods are turned into ands when appropriate.

c) Target optimization
When the target for an assignment is known, a temp register will not be allocated, but the target will be used directly. This keeps us from generating dead temp registers that will later have to be optimized out of the icode.

d) Dead code elimination
Delete jumps to jumps, jumps to the next statement, and dead code. Also delete any temporaries that 2) came up with that are now unused.

Note that the SETJMP libraries, for example, will NOT save the state of floating point registers. So there is a switch to disable optimization into floating point registers in case you need to setjmp to a routine that uses floating point. Address and data register optimizations can also be turned off. See switches.doc.
e) Reordering expressions
In some cases the compiler generates better code if expressions are reordered; for example, a = a + 10; can be turned into a += 10; and better code gets generated. Also, a lot of work has been put into optimizing usage of the based/indexed modes of the processors when it can be done. The present version will even use index register scaling when possible!

f) Base + index addressing modes
This compiler goes to some length to identify when base + index addressing modes may be used to generate an address.

7.1) Description of directory tree

Sources to this compiler are included in separate packages. Sources should be generic; that is, they should work on any architecture where the byte size is 8 bits. However, I use weird tab settings in my editor. If you want to comprehend the sources, get a beautifier of some sort and run the sources through it first, or set your editor tab setting to 2 to see what I see.

The directory structure is:

CLIBS     various sources for run-time library
DOC       documentation
EXAMP     a simple example (there is a more complex one in clibs\startup\test)
INCLUDE   compiler header files
OBJECT    compiler make/object files
SOURCE    compiler source files

There are two groups of sources:

1) the compiler, in the SOURCE, INCLUDE, and OBJECT directories
2) libraries you can use in conjunction with the compiler to generate programs (target run-time libraries), in the CLIBS directory

I often use the set of triple directories SOURCE, OBJECT, INCLUDE for a given project so I won't clutter up a single directory with dozens of files. When this triple comes up, sources are in the SOURCE directory, headers the sources depend on are in the INCLUDE directory, and you can expect me to chdir to the OBJECT directory to compile the program; thus you will find the make file there. For these triples you thus have to use an include path which consists of the INCLUDE directory when you compile the source files.
proto.bat generates the file INCLUDE\CC.P, which is a prototype file I'm using to keep the compiler honest with me. You shouldn't have to change that unless you make major changes to the sources, but you can edit CC.P directly and put new prototypes in if you want. I often do. I only use proto.bat when I'm making major changes to the compiler.

7.2) Porting

This version of the compiler is intended to be portable; one need only rewrite the back end for the given target. This portability probably extends only to processors with a 'byte' architecture.

The following symbols have to be defined on the command line:

-DPROGNAME="CC386"     ; Name of the program which will appear in the banner
-DENVNAME="CC386"      ; Name of the environment variable to consult for command line parameters
-DGLBDEFINE="_i386_"   ; Symbol to define in the source; can be used to identify processor-specific needs
-DSOURCEXT=".ASM"      ; Extension to use on the output file

These definitions are imported by CMAIN.C to define the program environment. I have shown you the definitions used by the 386 compiler; change them as necessary for your target.

The following files comprise the 386 backend. They should be all you have to change to port the compiler to a new processor. I suggest you rename them to something else before changing them:

an386.c     - Register optimization
reg386.c    - Register allocation for expressions
conf386.c   - Configuration; int sizes and free registers and such
outas386.c  - Outputs ASM code
gexpr386.c  - Turn the expression parse trees into code
gstmt386.c  - Turn the stmt parse trees into code
peep386.c   - Peephole analysis for this processor

For more information on porting, contact the author of the code.

David Lindauer (gclind01@starbase.spd.louisville.edu)
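As a rough illustration of how the command-line symbols listed in 7.2 might be consumed, here is a minimal C sketch. It is not taken from CMAIN.C, whose contents are not shown in this document; the helper names banner_name, env_options and make_output_name are hypothetical, and the #ifndef fallbacks exist only so the sketch compiles when nothing is passed with -D.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Fallbacks using the 386 values quoted in 7.2 (assumption: a real
   build would always supply these with -D on the command line). */
#ifndef PROGNAME
#define PROGNAME "CC386"
#endif
#ifndef ENVNAME
#define ENVNAME "CC386"
#endif
#ifndef GLBDEFINE
#define GLBDEFINE "_i386_"
#endif
#ifndef SOURCEXT
#define SOURCEXT ".ASM"
#endif

/* Name to show in the banner (hypothetical helper). */
const char *banner_name(void)
{
    return PROGNAME;
}

/* Extra command line parameters from the environment, or NULL if the
   variable is not set (hypothetical helper). */
const char *env_options(void)
{
    return getenv(ENVNAME);
}

/*
 * Build the output file name by replacing the extension of the input
 * name with SOURCEXT. A dot only counts as an extension if it comes
 * after the last backslash. Returns 0 on success, -1 if the result
 * does not fit in the buffer.
 */
int make_output_name(const char *input, char *out, size_t outsize)
{
    const char *dot, *sep;
    size_t stem;

    assert(input != NULL && out != NULL);
    dot = strrchr(input, '.');
    sep = strrchr(input, '\\');
    stem = strlen(input);
    if (dot != NULL && (sep == NULL || dot > sep))
        stem = (size_t)(dot - input);
    if (stem + strlen(SOURCEXT) + 1 > outsize)
        return -1;
    memcpy(out, input, stem);
    strcpy(out + stem, SOURCEXT);
    return 0;
}
```

With the 386 definitions, make_output_name("HELLO.C", ...) produces HELLO.ASM; a port to another target would only change the values passed with -D.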